Goto

Collaborating Authors

 Laguna Hills


A Concise Review of Hallucinations in LLMs and their Mitigation

Pulkundwar, Parth, Dhanawade, Vivek, Yadav, Rohit, Sonkar, Minal, Asurlekar, Medha, Rathod, Sarita

arXiv.org Artificial Intelligence

Abstract--Traditional language models face a challenge from hallucinations. Their very presence casts a large, dangerous shadow over the promising realm of natural language processing. It becomes crucial to understand the various kinds of hallucinations that occur nowadays, their origins, and ways of reducing them. This document provides a concise and straightforward summary of that. It serves as a one-stop resource for a general understanding of hallucinations and how to mitigate them. In the fast-moving world of Natural Language Processing (NLP) today, large language models (LLMs) such as GPT, BERT, and others have become the principal agents of change in natural language processing. They can generate human-like text, answer multifaceted questions, or engage in conversation with as much fluency.



Conversational Collective Intelligence (CCI) using Hyperchat AI in a Real-world Forecasting Task

Schumann, Hans, Rosenberg, Louis, Mani, Ganesh, Willcox, Gregg

arXiv.org Artificial Intelligence

Hyperchat AI is a novel agentic technology that enables thoughtful conversations among networked human groups of potentially unlimited size. It allows large teams to discuss complex issues, brainstorm ideas, surface risks, assess alternatives and efficiently converge on optimized solutions that amplify the group's Collective Intelligence (CI). A formal study was conducted to quantify the forecasting accuracy of human groups using Hyperchat AI to conversationally predict the outcome of Major League Baseball (MLB) games. During an 8-week period, networked groups of approximately 24 sports fans were tasked with collaboratively forecasting the winners of 59 baseball games through real-time conversation facilitated by AI agents. The results showed that when debating the games using Hyperchat AI technology, the groups converged on High Confidence predictions that significantly outperformed Vegas betting markets. Specifically, groups were 78% accurate in their High Confidence picks, a statistically strong result vs the Vegas odds of 57% (p=0.020). Had the groups bet against the spread (ATS) on these games, they would have achieved a 46% ROI against Vegas betting markets. In addition, High Confidence forecasts that were generated through above-average conversation rates were 88% accurate, suggesting that real-time interactive deliberation is central to amplified accuracy.


Provenance Networks: End-to-End Exemplar-Based Explainability

Kayyam, Ali, Gopal, Anusha Madan, Lewis, M. Anthony

arXiv.org Artificial Intelligence

We introduce provenance networks, a novel class of neural models designed to provide end-to-end, training-data-driven explainability. Unlike conventional post-hoc methods, provenance networks learn to link each prediction directly to its supporting training examples as part of the model's normal operation, embedding interpretability into the architecture itself. Conceptually, the model operates similarly to a learned KNN, where each output is justified by concrete exemplars weighted by relevance in the feature space. This approach facilitates systematic investigations of the trade-off between memorization and generalization, enables verification of whether a given input was included in the training set, aids in the detection of mislabeled or anomalous data points, enhances resilience to input perturbations, and supports the identification of similar inputs contributing to the generation of a new data point. By jointly optimizing the primary task and the explainability objective, provenance networks offer insights into model behavior that traditional deep networks cannot provide. While the model introduces additional computational cost and currently scales to moderately sized datasets, it provides a complementary approach to existing explainability techniques. In particular, it addresses critical challenges in modern deep learning, including model opaqueness, hallucination, and the assignment of credit to data contributors, thereby improving transparency, robustness, and trustworthiness in neural models.


15,500 Seconds: Lean UAV Classification Using EfficientNet and Lightweight Fine-Tuning

Berg, Andrew P., Zhang, Qian, Wang, Mia Y.

arXiv.org Artificial Intelligence

As unmanned aerial vehicles (UAVs) become increasingly prevalent in both consumer and defense applications, the need for reliable, modality-specific classification systems grows in urgency. This paper addresses the challenge of data scarcity in UAV audio classification by expanding on prior work through the integration of pre-trained deep learning models, parameter-efficient fine-tuning (PEFT) strategies, and targeted data augmentation techniques. Using a custom dataset of 3,100 UAV audio clips (15,500 seconds) spanning 31 distinct drone types, we evaluate the performance of transformer-based and convolutional neural network (CNN) architectures under various fine-tuning configurations. Experiments were conducted with five-fold cross-validation, assessing accuracy, training efficiency, and robustness. Results show that full fine-tuning of the EfficientNet-B0 model with three augmentations achieved the highest validation accuracy (95.95), outperforming both the custom CNN and transformer-based models like AST. These findings suggest that combining lightweight architectures with PEFT and well-chosen augmentations provides an effective strategy for UAV audio classification on limited datasets. Future work will extend this framework to multimodal UAV classification using visual and radar telemetry.



Automating SPARQL Query Translations between DBpedia and Wikidata

Bartels, Malte Christian, Banerjee, Debayan, Usbeck, Ricardo

arXiv.org Artificial Intelligence

This paper investigates whether state-of-the-art Large Language Models (LLMs) can automatically translate SPARQL between popular Knowledge Graph (KG) schemas. We focus on translations between the DBpedia and Wikidata KG, and later on DBLP and OpenAlex KG. This study addresses a notable gap in KG interoperability research by rigorously evaluating LLM performance on SPARQL-to-SPARQL translation. Two benchmarks are assembled, where the first align 100 DBpedia-Wikidata queries from QALD-9-Plus; the second contains 100 DBLP queries aligned to OpenAlex, testing generalizability beyond encyclopaedic KGs. Three open LLMs: Llama-3-8B, DeepSeek-R1-Distill-Llama-70B, and Mistral-Large-Instruct-2407 are selected based on their sizes and architectures and tested with zero-shot, few-shot, and two chain-of-thought variants. Outputs were compared with gold answers, and resulting errors were categorized. We find that the performance varies markedly across models and prompting strategies, and that translations for Wikidata to DBpedia work far better than translations for DBpedia to Wikidata.


Using Mobile AR for Rapid Feasibility Analysis for Deployment of Robots: A Usability Study with Non-Expert Users

Zielinski, Krzysztof, Tadeja, Slawomir, Blumberg, Bruce, Kjærgaard, Mikkel Baun

arXiv.org Artificial Intelligence

Automating a production line with robotic arms is a complex, demanding task that requires not only substantial resources but also a deep understanding of the automated processes and available technologies and tools. Expert integrators must consider factors such as placement, payload, and robot reach requirements to determine the feasibility of automation. Ideally, such considerations are based on a detailed digital simulation developed before any hardware is deployed. However, this process is often time-consuming and challenging. To simplify these processes, we introduce a much simpler method for the feasibility analysis of robotic arms' reachability, designed for non-experts. We implement this method through a mobile, sensing-based prototype tool. The two-step experimental evaluation included the expert user study results, which helped us identify the difficulty levels of various deployment scenarios and refine the initial prototype. The results of the subsequent quantitative study with 22 non-expert participants utilizing both scenarios indicate that users could complete both simple and complex feasibility analyses in under ten minutes, exhibiting similar cognitive loads and high engagement. Overall, the results suggest that the tool was well-received and rated as highly usable, thereby showing a new path for changing the ease of feasibility analysis for automation.


Convolutional Fourier Analysis Network (CFAN): A Unified Time-Frequency Approach for ECG Classification

Jeong, Sam, Kim, Hae Yong

arXiv.org Artificial Intelligence

Machine learning has transformed the classification of biomedical signals such as electrocardiograms (ECGs). Advances in deep learning, particularly convolutional neural networks (CNNs), enable automatic feature extraction, raising the question: Can combining time- and frequency-domain attributes enhance classification accuracy? To explore this, we evaluated three ECG classification tasks: (1) arrhythmia classification, (2) identity recognition, and (3) apnea detection. We initially tested three methods: (i) 2-D spectrogram-based frequency-time classification (SPECT), (ii) time-domain classification using a 1-D CNN (CNN1D), and (iii) frequency-domain classification using a Fourier transform-based CNN (FFT1D). Performance was validated using K-fold cross-validation. Among these, CNN1D (time only) performed best, followed by SPECT (time-frequency) and FFT1D (frequency only). Surprisingly, SPECT, which integrates time- and frequency-domain features, performed worse than CNN1D, suggesting a need for a more effective time and frequency fusion approach. To address this, we tested the recently proposed Fourier Analysis Network (FAN), which combines time- and frequency-domain features. However, FAN performed comparably to CNN1D, excelling in some tasks while underperforming in others. To enhance this approach, we developed the Convolutional Fourier Analysis Network (CFAN), which integrates FAN with CNN. CFAN outperformed all previous methods across all classification tasks. These findings underscore the advantages of combining time- and frequency-domain features, demonstrating CFAN's potential as a powerful and versatile solution for ECG classification and broader biomedical signal analysis


Foundation Models for CPS-IoT: Opportunities and Challenges

Baris, Ozan, Chen, Yizhuo, Dong, Gaofeng, Han, Liying, Kimura, Tomoyoshi, Quan, Pengrui, Wang, Ruijie, Wang, Tianchen, Abdelzaher, Tarek, Bergés, Mario, Liang, Paul Pu, Srivastava, Mani

arXiv.org Artificial Intelligence

Methods from machine learning (ML) have transformed the implementation of Perception-Cognition-Communication-Action loops in Cyber-Physical Systems (CPS) and the Internet of Things (IoT), replacing mechanistic and basic statistical models with those derived from data. However, the first generation of ML approaches, which depend on supervised learning with annotated data to create task-specific models, faces significant limitations in scaling to the diverse sensor modalities, deployment configurations, application tasks, and operating dynamics characterizing real-world CPS-IoT systems. The success of task-agnostic foundation models (FMs), including multimodal large language models (LLMs), in addressing similar challenges across natural language, computer vision, and human speech has generated considerable enthusiasm for and exploration of FMs and LLMs as flexible building blocks in CPS-IoT analytics pipelines, promising to reduce the need for costly task-specific engineering. Nonetheless, a significant gap persists between the current capabilities of FMs and LLMs in the CPS-IoT domain and the requirements they must meet to be viable for CPS-IoT applications. In this paper, we analyze and characterize this gap through a thorough examination of the state of the art and our research, which extends beyond it in various dimensions. Based on the results of our analysis and research, we identify essential desiderata that CPS-IoT domain-specific FMs and LLMs must satisfy to bridge this gap. We also propose actions by CPS-IoT researchers to collaborate in developing key community resources necessary for establishing FMs and LLMs as foundational tools for the next generation of CPS-IoT systems.